Morphotonology for TTS in Niger-Congo languages

Author

Dafydd Gibbon

Abstract

It is well known that many East Asian languages have lexical (i.e. phonemic) prosody, and languages such as Mandarin are very well described. African languages are also frequently mentioned in the literature as tone languages, and phonetic interface patterns such as downstep are well documented. It is less well known that the functionality of tone patterning in African tone languages is fundamentally morphosyntactic rather than phonemic, in that (a) tonal patterning is specific to particular parts of speech, (b) tones may have inflectional function, and tones play a role (c) in derivational and (d) in compounding word formation patterns and (e) in marking syntactic phrasal templates. The aim of this paper is both to document the morphosyntactic functionality of tones in African languages within a typological context, as compared to East Asian tone languages such as Mandarin, and to develop finite state architectures for tone handling in practical Text-To-Speech synthesis in health and agriculture information projects in Ivory Coast and Nigeria. Morphosyntactic tonal functionality is illustrated for Ibibio (Lower Cross, Niger-Congo, South-Eastern Nigeria), but also applies to other Western and Central African languages.

1. Forms and functions in pitch systems

The theoretical objective of this study is to place the typological characteristics of the tonal systems of Niger-Congo languages, and possibly also of Bantu languages, within a clear framework for pitch system typology. The term ‘pitch’ is used deliberately in order to ground the analysis empirically in a clear phonetic domain; this is not possible with generally rather ill-defined notions such as ‘intonation’ and ‘tone’.
The operational objective is to provide a basis for incorporating the complex pitch systems of Western and Central African languages into speech synthesis systems for practical deployment in health and agricultural information and education projects. These pitch systems differ greatly from the better known intonation-accent languages of Europe and the lexical tone languages of South-East Asia. For reasons of space, attention is restricted to Ibibio (Lower Cross, Niger-Congo, Nigeria) and Mandarin, as typical representatives of two major ‘lexical tone’ language types.

1.1. Descriptive preliminaries

In a cross-linguistic perspective, both the organisation of pitch forms and their functionality are highly versatile, and can be summarised in a straightforward manner with a Jespersenian functionalist rank-based language architecture [Jespersen 1924]; cf. Table 1. Pitch functionality at the phonological and morphological ranks (which determine word prosody) is usually referred to as tone, and at the syntactic, text and dialogue ranks (which determine discourse prosody) as intonation.

Table 1: Pitch-functional rank levels.

  Rank        Pitch function
  Phonology   phonemic ‘lexical tone’
  Morphology  morphemic sub-rank: morphophonemic ‘lexical tone’
              derived word sub-rank: morphological templatic tone
              compound word sub-rank: tonal interfixation
              inflectional sub-rank: morphosyntax
  Syntax      phrase sub-rank: templatic phrasal intonation
              sentence sub-rank: sentence intonation: phrasing, accentuation, nucleus
  Text        textual ‘paragraph’ intonation: cohesive pitch contours, focal and contrastive accentuation
  Dialogue    dialogue control, emotion

1.2. Formal preliminaries

It is known that, at least for the organisation of forms, the basic structure of pitch systems can be modelled by regular (linear) languages and regular (linear) grammars (equivalently: finite state automata, FSAs).
Finite State (FS) modelling holds even where hierarchies are involved in the organisation of pitch systems: the hierarchies put forward so far are either of finite depth or are purely right-branching or left-branching (but cf. [Steedman 1991]) and thus formally FS-equivalent. Explicit applications of FSAs to modelling intonation forms have been available since [Reich 1969], the 1970s IPO model [’t Hart & Cohen 1973], [Pierrehumbert 1980] and [Gibbon 1981]. Implicitly, many other intonation models, including [Fujisaki 1988], are also FS models. In this paper, FS intonation patterning will not be dealt with further except in the context of interface issues.

Finite state transducer (FST) modelling of the tone–phonetics interface started with a model of two Niger-Congo languages (Baule, Kwa; Tem, Gur) in [Gibbon 1987] (cf. also [Gibbon 2001]). The technique was extended to Mandarin tonal sandhi, mapping lexical tone sequences to other lexical tone sequences, in [Jansche 1998]. These models will be discussed below.

2. Finite-State tonal interfaces

2.1. FS syntax–prosody interface (intonation)

Basic FS techniques have also been used for modelling the syntax–prosody interface. Perhaps the first contribution to FS syntax–prosody interface modelling was hinted at in [Chomsky 1965] in a discussion of criteria for performance models, where he points out that right-branching sentences such as "I called up the man who wrote the book that you told me about" are more acceptable than centre-embedded sentences such as "The man who the boy who the students recognised pointed out is a friend of mine" (p. 10f.). In Chomsky’s later work prosodic patterns were derived from arbitrarily embedded nested structures, leading to unrealistic numbers of distinctions between ‘stress’ levels [Chomsky & Halle 1968].

Speech Prosody 2006, Dresden, Germany, May 2-5, 2006. ISCA Archive, http://www.isca-speech.org/archive
The first explicit discussion of the syntax–prosody relation in FS terms is by Reich [Reich 1969], who criticises Chomsky’s use of overly complex grammar types, and proposes:

1. that English syntax tolerates only one level of centre-embedding and otherwise left and right branching,
2. that therefore FS devices are adequate for modelling English syntax, and
3. that the essentially iterative structure is marked by intonation (p. 840): “[...] the taking of a loop [...] is marked by a distinctive intonation pattern, the rising non-terminal contour. [...] When the loop is not taken, the characteristic falling terminal contour results.”

Reich thus models intonation patterns of the type rise* (rise|fall) (right or left branching) for cases such as the following:

1. Right-branching: The cow tossed the dog that worried the cat that killed the rat that ate the malt
2. Left-branching: Henry is Doug’s father’s second wife’s sister’s daughter’s husband

The distinction was further developed in the ‘readjustment rule discussion’ (an older term for the syntax–prosody interface discussion), particularly in [Bierwisch 1966], who first pointed out the need for flattening syntactic structures, and then in [Culicover & Rochemont 1983]. These approaches developed algorithms for flattening syntactic structures. Many applications of this ‘flatter prosody principle’ to speech synthesis have been made, e.g. [Campbell 1993] and [Wagner 2000], the latter explicitly using FS methods.

2.2. Tonemic sandhi interface (tone)

Figure 1: Mandarin tone FST (from [Jansche 1998]).

Taking a new approach to the computational prosody of Mandarin, Jansche presents a Finite State Transducer (FST) model of tone sandhi in the Tianjin variety, shown in Figure 1. The tone sequences of canonical lexical forms are mapped to other tone sequences; both input and output tones are from the lexical tone inventory.
Mandarin sandhi mapping is therefore not a phonetic mapping to allotones, but substitutes lexical phonemic tones in the environment of other lexical phonemic tones. Formally, the Tianjin Mandarin FST can be analysed into a set of relatively unrelated and idiosyncratic FSTs. The overall FST models the union of the regular relations which are modelled by the individual FSTs [Kaplan & Kay 1994].

2.3. Morphotonemic–phonetic interface (tone)

Figure 2: (a) Basic 2-tone Niger-Congo FST; (b) Generalisation of tone FST mapping types ([Gibbon 2001]).

Figure 3: Variants: Baule FST (with lookahead) and 3-tone FST.

The complex tonal structure and functionality of Western and Central African languages has often been discussed in the literature: tone terracing vs. discrete level tone patterns, automatic and lexical downstep, upstep, downdrift and upsweep, tonal blocking, tone-depressor consonants. Of specific interest here is tone terracing. Tone terracing is a phonemic-phonetic mapping, and was modelled in Metrical Phonology by right-branching trees [Clements 1981]. Since right-branching trees are accessible to FS modelling, [Gibbon 1987] concluded that tone terrace mapping is formalisable with FSTs, and provided FST models of terracing in Baule (Kwa, Ivory Coast; cf. Figure 3(a) for a locally non-deterministic FST with 1-place lookahead) and in Tem (Gur, Togo). In [Gibbon 2001] these models were generalised to a schema for any two-tone terraced system; cf. Figure 2(b). Figure 3(b) shows the simple FST model required for 3-tone discrete-level tone systems. Formally, the basic two-tone FST is the union of just two simpler FSTs, one starting with high tones and one with low tones. The simpler FSTs are isomorphic but for the labelling.
The topology of the African tone FSTs shown in Figures 2 and 3 has very general and symmetrical properties, and is thus quite different from the more idiosyncratic topology of the Mandarin tone FST of Figure 1, in addition to the difference in levels of representation. It should be noted that this kind of explicit tone grammar modelling has not yet found its way into tone description in ‘mainstream phonology’ (cf. the Handbook contributions of Odden and Yip in [Goldsmith 1995], and many later publications).

2.4. Phonetic–acoustic interface (tone)

For modelling the pitch time function for high and low tone sequences, an asymptotic function similar to that of [Liberman & Pierrehumbert 1984] was used (the reference unit for pitch association is the syllable):

$$\mathit{pitch}_{i+1} = \mathit{tone} \cdot (\mathit{pitch}_{i} - \mathit{baseline}) + \mathit{baseline}$$

with speaker-specific initial, baseline and tone values, where tone < 1 for low tones and tone > 1 for high tones. The model is linear and local. Using the FSTs shown in Figures 2 and 3 it is straightforward to implement a transducer in which the phonetic output symbols are replaced with the appropriately instantiated numerical functions. A ‘toy instantiation’ can be illustrated informally (not in the detail required for Ibibio) as follows:

  High tone factor:        1.1
  Low tone factor:         0.8
  Downstep factor:         1.3
  Upstep factor:           0.6
  Baseline component:      100 Hz
  Initial high component:  80 Hz
  Initial low component:   80 Hz

  Input (tones):  H    L    H    L
  Output (Hz):    180  148  162  137

The ‘real world’ empirical basis for the actual Ibibio model was induced by an exhaustive prosodic data mining algorithm applied to Ibibio data [Gibbon, Urua & Gut 2003].

3. Morphophonemic and morphosyntactic tone

The previous section dealt with form–form interfaces; the present section deals with morphotonology at the syntax–morphology interface. Such factors are frequently referred to in passing in the literature, but, like metrical trees, have never been provided with a grammar model.
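As an aside on the pitch model of Section 2.4, the toy instantiation can be reproduced with a short Python sketch of a two-state transducer whose transitions carry numerical factors instead of phonetic symbols. The pairing of factors with transitions is an assumption: it was inferred from the toy figures (it reproduces the output row 180 148 162 137) and is not taken from the FSTs in Figures 2 and 3. The names `FACTOR` and `tones_to_pitch` are hypothetical.

```python
# Toy two-state tone-to-pitch transducer for the asymptotic model
#   pitch[i+1] = tone * (pitch[i] - baseline) + baseline
# ASSUMPTION: the pairing of factors with transitions is inferred from the
# toy figures in the text; it is not taken from the paper's FSTs.

BASELINE_HZ = 100.0
INITIAL_OFFSET_HZ = {"H": 80.0, "L": 80.0}  # initial high / low components

# (previous tone, current tone) -> multiplicative factor
FACTOR = {
    ("H", "H"): 1.1,  # listed as "high tone factor"
    ("L", "L"): 0.8,  # listed as "low tone factor"
    ("L", "H"): 1.3,  # listed as "downstep factor"
    ("H", "L"): 0.6,  # listed as "upstep factor"
}


def tones_to_pitch(tones):
    """Map a sequence of 'H'/'L' tones to one pitch value (Hz) per syllable."""
    pitches = []
    prev_tone = None
    for tone in tones:
        if tone not in INITIAL_OFFSET_HZ:
            raise ValueError(f"unknown tone symbol: {tone!r}")
        if prev_tone is None:
            # First syllable: baseline plus the speaker-specific initial component.
            pitch = BASELINE_HZ + INITIAL_OFFSET_HZ[tone]
        else:
            # Local, linear update relative to the baseline.
            factor = FACTOR[(prev_tone, tone)]
            pitch = factor * (pitches[-1] - BASELINE_HZ) + BASELINE_HZ
        pitches.append(pitch)
        prev_tone = tone
    return pitches


if __name__ == "__main__":
    print([round(p) for p in tones_to_pitch("HLHL")])  # [180, 148, 162, 137]
```

Because each update depends only on the previous tone and the previous pitch value, the state of the transducer is just the last tone seen, which is what makes the model local in the sense used above.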
The morphotonological factors will be illustrated with Ibibio, a Lower Cross language of South-Eastern Nigeria with the fourth largest language population in Nigeria. The factors involved in Ibibio morphotonology (simplifying for brevity of presentation) are:

1. Part of Speech (POS): there are four tonological categories determined by POS in Ibibio [Essien 1990, Urua 2000]:
(a) Nouns: lexical tone with phonemic functionality, comparable with tone in East Asian languages, e.g. òbû ‘crayfish’ vs. óbû ‘dust’.
(b) Verbs: fixed tonal templates, modifiable by inflexion and verb subcategorisation.
(c) Autonomous tonal function morphemes: HL meaning ‘proximate future/past’ and LH meaning ‘non-proximate future/past’, with the tense prefixes yaa and maa, respectively, e.g. n-yaa-ka ‘I will go (sometime)’.
(d) Composition template morphemes: in word-formation, superimposed patterns which function as ‘interfixes’: ènò ‘gift’ and àbàsì ‘God’ form a compound: ènò + high tone + àbàsì → ènòábàsì (cf. the interfix function in German: Liebesbrief).
(e) Templatic function words and affixes: determiners, which are NP-initial, tend to have the same pattern, high-low, while quantifiers, which are NP-final, tend to have the opposite pattern, low-high.

Figure 4: Ibibio Noun Phrase FST.

The autonomous tonal function morphemes and the tones of templatic function words and affixes are particularly interesting from the FS processing point of view: the sub-sentential structures which they mark are non-hierarchical, and can be modelled with FS devices; cf. Figure 4. The symbols in the morphosyntactic FST stand for pairs of a part of speech (POS) and the appropriate set of tones for each POS. Consequently, a full description of tonal morphosyntax requires three aligned levels of representation (tiers, tapes).
This architecture has not yet been implemented, but two models are being considered: either Kay’s multi-tape FSTs for Arabic templatic morphology [Kay 1987], or a cascade of a morphosyntactic FST as in Figure 4 and a terracing FST as in Figure 2, which can be composed for efficiency into a single FST [Kaplan & Kay 1994].
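The cascade option can be pictured with a minimal Python sketch, under several loudly stated assumptions. The noun phrase pattern (optional determiner, noun, optional quantifier), the noun's lexical tones in the usage example, the output symbols `h`, `l` and `!h`, and all function names are hypothetical illustrations and are not taken from Figure 4 or Figure 2. Plain function composition stands in here for true transducer composition, which would yield a single FST.

```python
# Sketch of a two-stage cascade: morphosyntactic NP stage -> terracing stage.
# ASSUMPTIONS (hypothetical, not from the paper's figures): the NP pattern
# DET? N QUANT?, the output symbols "h", "l", "!h", and all names below.

# Stage 1: finite state automaton over POS tags. state -> {POS: next state}
NP_TRANSITIONS = {
    "start": {"DET": "det", "N": "noun"},
    "det": {"N": "noun"},
    "noun": {"QUANT": "quant"},
    "quant": {},
}
NP_FINAL = {"noun", "quant"}

# Templatic tones for function words (determiners high-low, quantifiers
# low-high, as described in the text); nouns carry their own lexical tones.
TEMPLATE_TONES = {"DET": "HL", "QUANT": "LH"}


def np_to_tones(words):
    """words: list of (POS, lexical tones or None); returns a flat tone string."""
    state = "start"
    tones = []
    for pos, lexical in words:
        if pos not in NP_TRANSITIONS[state]:
            raise ValueError(f"unexpected {pos} in state {state}")
        state = NP_TRANSITIONS[state][pos]
        tone = TEMPLATE_TONES[pos] if pos in TEMPLATE_TONES else lexical
        if not tone:
            raise ValueError(f"no tones available for {pos}")
        tones.append(tone)
    if state not in NP_FINAL:
        raise ValueError("incomplete noun phrase")
    return "".join(tones)


def terrace(tones):
    """Stage 2: simplified terracing, where a high tone after a low tone is
    emitted as downstepped high ('!h')."""
    out, prev = [], None
    for tone in tones:
        out.append("!h" if (prev == "L" and tone == "H") else tone.lower())
        prev = tone
    return out


def compose(first, second):
    """Function composition as a stand-in for FST composition."""
    return lambda x: second(first(x))


np_to_phonetic = compose(np_to_tones, terrace)

if __name__ == "__main__":
    # Hypothetical NP: determiner, noun with assumed lexical tones LH, quantifier.
    print(np_to_phonetic([("DET", None), ("N", "LH"), ("QUANT", None)]))
```

In this sketch the three levels mentioned above appear as the POS sequence (input), the intermediate tone string, and the terraced output symbols; a multi-tape model would instead keep all three aligned in a single device.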

Publication date: 2005
